File-type Identification with Incomplete Information

Authors

  • Siddharth Gopal
  • Yiming Yang
  • Konstantin Salomatin
  • Jaime Carbonell
Abstract

File-type Identification (FTI) is an important problem in digital forensics, intrusion detection, and related fields. Applying state-of-the-art classification techniques to FTI has begun to receive research attention; however, no general conclusions have been reached because thorough evaluations comparing methods are lacking. This paper presents a systematic investigation of the problem, algorithmic solutions, and an evaluation methodology. Our focus is on comparing the performance of statistical classifiers (e.g., SVM and kNN) with knowledge-based approaches, especially COTS (Commercial Off-The-Shelf) solutions, which currently dominate FTI applications. We analyze the robustness of different methods in handling damaged files and file segments. We propose two alternative criteria for measuring performance: 1) treating file-name extensions as the true labels, and 2) treating the predictions that knowledge-based approaches, which rely on signature bytes, make on intact files as the true labels, and then removing those signature bytes before testing each method. In our experiments with simulated damage to files, SVM and kNN substantially outperform all the COTS solutions we tested, some of which cannot identify damaged files at all. Our experiments also show that SVM and kNN scale to large applications after adequate feature selection.
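The abstract does not specify the feature representation the classifiers use. As a minimal sketch under stated assumptions, the code below shows how a statistical classifier could label content whose leading signature bytes have been removed: it assumes byte-unigram (byte-frequency) histograms as features, cosine-similarity kNN as the classifier, and extension-style labels as ground truth (the first criterion above). The names `byte_histogram`, `cosine`, `strip_signature`, and `knn_predict`, the 4-byte signature length, and the toy training data are all hypothetical illustrations, not the authors' setup.

```python
from collections import Counter
import math

def byte_histogram(data: bytes) -> list:
    """Return a 256-dim vector of normalized byte-unigram frequencies (assumed feature set)."""
    counts = [0] * 256
    for b in data:  # iterating over bytes yields ints 0..255
        counts[b] += 1
    total = len(data) or 1  # avoid division by zero on empty input
    return [c / total for c in counts]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def strip_signature(data: bytes, n: int) -> bytes:
    """Simulate damage by dropping the first n bytes, where magic/signature bytes usually sit."""
    return data[n:]

def knn_predict(train, query: bytes, k: int = 3) -> str:
    """Label `query` by majority vote among the k most similar training samples.

    `train` is a list of (label, content) pairs; labels stand in for file-name extensions.
    """
    q = byte_histogram(query)
    scored = sorted(
        ((cosine(byte_histogram(content), q), label) for label, content in train),
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy training set: two text-like samples, plus two samples with
# uniform byte distributions standing in for compressed/binary content.
TRAIN = [
    ("txt", b"hello world, this is plain text. " * 20),
    ("txt", b"the quick brown fox jumps over the lazy dog " * 20),
    ("bin", bytes(range(256)) * 4),
    ("bin", bytes((i * 7) % 256 for i in range(1024))),
]

damaged_text = strip_signature(b"some other ascii text goes here " * 10, 4)
damaged_blob = strip_signature(bytes((i * 13) % 256 for i in range(512)), 4)
print(knn_predict(TRAIN, damaged_text))  # txt
print(knn_predict(TRAIN, damaged_blob))  # bin
```

kNN appears here only because it fits in a few lines. The abstract reports results for both SVM and kNN, and a realistic setup would work with far larger training sets and apply feature selection before classification.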

Similar resources

Effect of missing paternal information on genetic gain and genetic trend of a quantitative trait using computer simulation

In order to study the effect of incomplete sire pedigree on the genetic trend (bBv,y) and genetic gain (R) of a quantitative trait, two populations were simulated with heritabilities of 0.15 and 0.30. For each population, the information resulting from ten years of selection was saved in different files. In the generated data files, sire numbers were eliminated from the pedigree file at 0, 10, 20, …, 100 percentag...

SÁDI - Statistical Analysis for Data Type Identification

A key task in digital forensic analysis is the location of relevant information within the computer system. Identification of the relevancy of data is often dependent upon the identification of the type of data being examined. Typical file type identification is based upon file extension or magic keys. These typical techniques fail in many typical forensic analysis scenarios such as needing to ...

Extensions of the UNIX File Command and Magic File for File Type Identification

File format identification is a core requirement for digital archives. The UNIX file command is among the most promising technologies for file type identification. This report describes extensions to the file command and magic file that enhance their utility for file format identification in archival systems. A File Format Library (database) has been created to manage information about file for...

Incidence of incomplete excision in surgically treated basal cell carcinomas and identification of the related risk factors

Background: Surgery is the most frequent treatment modality for basal cell carcinoma but in spite of its high cure rate, the frequency of incomplete excision varies widely (0.7-50%) among dermatologic centers. Our case series was designed to determine the frequency of incompletely excised basal cell carcinoma and the related risk factors. Methods: A total of 1424 basal cell carcinoma (1040 pati...

Sliding Window Measurement for File Type Identification

Knowing the file type associated with a sequence of bytes makes interpretation of those bytes far more meaningful. With the ever increasing number of file types in existence and the massive storage capacity of modern hardware, it is impractical to try interpreting a sequence of bytes as every known file type until one succeeds. Furthermore, some file types require specific header or footer info...

Publication date: 2011